Data Visualization

Tools Used: ggplot, plotly, Tableau

For this section, we will visualize each dataset outlined in the Data Sources tab, allowing us to better understand the nature of each area related to public transit usage. This will be helpful for identifying trends, seasonality, and cycles, each of which will be further analyzed in the EDA section.

American Public Transit Ridership

Code
#install.packages("tidyverse")
library(tidyverse)
library(ggplot2)
library(forecast)
library(astsa) 
library(xts)
library(tseries)
library(fpp2)
library(fma)
library(lubridate)
library(tidyverse)
library(TSstudio)
library(quantmod)
library(tidyquant)
library(plotly)
library(dplyr)
ridership <- read_csv("../data/ridership.csv")
Code
library(plotly)
fig <- ridership %>%
  plot_ly(
    x = ~`Quarter-Year`,
    y = ~`Total Ridership (000s)`,
    type = 'scatter',
    mode = 'lines',
    showlegend = F
  )%>% plotly::layout(title = list(text="Quarterly Total Public Transit Ridership"),
    xaxis = list(title="Date"),
    yaxis = list(title="Public Transit Trips (000s)"))

fig

This graph gives a good general overview of public transit usage before, during, and after the pandemic. Usage appeared to be moving relatively positively, although there is a noticeable decline even before 2020 which indicates that a reduction may have been occurring independent of COVID-19. However, this pales in comparison to the drastic drop in 2020 and subsequent rebound, of which we still likely haven’t seen the end. Finally, there may be a seasonality aspect, as Q1 always appears to be the lowest value in its cluster.

Code
library(dplyr)
recent_ridership <- ridership %>% filter(Year > 2015)

fig <- plot_ly(recent_ridership, x = ~`Quarter-Year`, y = ~`Heavy Rail (000s)`, type = 'bar', name = "Heavy Rail") %>% 
  add_trace(y = ~`Light Rail (000s)`, name = "Light Rail") %>% 
  add_trace(y = ~`Commuter Rail (000s)`, name = "Commuter Rail") %>% 
  add_trace(y = ~`Trolleybus (000s)`, name = "Trolleybus") %>% 
  add_trace(y = ~`Bus (000s)`, name = "Bus") %>% 
  add_trace(y = ~`Demand Response (000s)`, name = "Demand Response") %>% 
  add_trace(y = ~`Other (000s)`, name = "Other") %>% 
  layout(yaxis = list(title = 'Public Transit Trips (000s)'), barmode = 'stack',
         xaxis = list(title = 'Date'),
         title = list(text = 'Ridership by Transit Type since 2015'))

fig

This stacked bar chart simply shows the share that each transit type has in contributing to the first graph. The main takeaways are that heavy rails and buses are by far the most used types, and that as total usage increases/decreases, each transit type’s usage increases/decreases accordingly. There does not appear to be a type that thrives or diminishes at an opposite rate to the others.

Code
#install.packages("ggcorrplot")
library(ggcorrplot)
library(ggplot2)
corr_matrix <- cor(ridership[c(5:11)], use = "complete.obs")
ggcorrplot(corr_matrix, 
           method = "square",
           type = "full",
           lab = TRUE) + 
  ggtitle("Correlation Matrix of Public Transit Types") +
  theme(plot.title = element_text(hjust = 0.5))

This shows what we surmised in the previous graph, that transit types are mostly positively correlated with one another, and heavily so. Buses and trolleybuses appear to be less correlated with the rest, but it is reasonable to generalize that as one transit type’s usage increases, so will the rest.

City-Specific Public Transit Ridership

Code
mta <- read_csv("../code/MTA_ridership.csv")
mta$Date<-as.Date(mta$Date,"%m/%d/%Y")
mta <- mta[order(mta$Date),]
head(mta)
Code
mta$total_ridership <- mta$`Subways: Total Estimated Ridership` + mta$`Buses: Total Estimated Ridership` + mta$`LIRR: Total Estimated Ridership` + mta$`Metro-North: Total Estimated Ridership` + mta$`Staten Island Railway: Total Estimated Ridership`
mta$weekly_ma <- ma(mta$`total_ridership`, 7)
Code
fig <- mta %>%
  plot_ly(
    x = ~Date,
    y = ~`total_ridership`,
    type = 'scatter',
    mode = 'lines',
    name = "Daily Ridership",
    showlegend = T
  )%>%
  add_lines(y = ~weekly_ma, name = "7-day Moving Average", line = list(color = 'red')) %>% 
  layout(
      title = list(text="MTA: Daily Total Estimated Ridership"),
      xaxis = list(title="Date"),
        yaxis = list(title="Total Daily Public Transit Trips"))

fig

This gives a more granular overview of the usage decline and rebound in New York City, as we can see the daily values. The moving average is easier to decipher, and shows a trend similar to the general data for all of the United States. However, it is possible that the recovery is tapering off a bit. One thing to note is that there are drastically different values even within a small date range, suggesting that there is a better way to split up this data. Thus, the next graph will split by weekday and weekend.

Code
mta$day <- weekdays(as.Date(mta$Date))
mta <- mta %>%
  mutate(day_type = case_when(
    day %in% c("Saturday", "Sunday") ~ "Weekend",
    TRUE ~ "Weekday"
  ))
head(mta)
Code
fig <- mta %>%
  plot_ly(
    x = ~Date,
    y = ~`total_ridership`,
    group = ~day_type,
    color = ~day_type,
    type = 'scatter',
    mode = 'lines',
    showlegend = T
  )%>%
  layout(
      title = list(text="MTA: Daily Total Estimated Ridership (Split by Weekends and Weekdays)"),
      xaxis = list(title="Date"),
        yaxis = list(title="Total Daily Public Transit Trips"))
Warning in plot_ly(., x = ~Date, y = ~total_ridership, group = ~day_type, : The group argument has been deprecated. Use `group_by()` or split instead.
See `help('plotly_data')` for examples
Code
fig
Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

This confirms the suspicion that a great deal of the variation is simply due to the day of week. Additionally, many of the drops in the weekdays line come on federal holidays. With working weekdays being significantly higher in ridership, we can presume that a great deal of usage comes from the need to commute to work and school, at least in New York City.

Code
cta <- read_csv("../code/CTA_ridership.csv")
cta$Date<-as.Date(cta$service_date,"%m/%d/%Y")
cta$day <- weekdays(as.Date(cta$Date))
cta <- cta %>%
  mutate(type = case_when(
    day %in% c("Saturday", "Sunday") ~ "Weekend",
    TRUE ~ "Weekday"
  ))
cta$weekly_ma <- ma(cta$`total_rides`, 30)
cta <- cta[order(cta$Date),]
cta <- cta %>% filter(Date >= as.Date("2015-01-01"))
head(cta)
Code
fig <- cta %>%
  plot_ly(
    x = ~Date,
    y = ~`total_rides`,
    type = 'scatter',
    mode = 'lines',
    name = "Daily Ridership",
    showlegend = T
  )%>%
  add_lines(y = ~weekly_ma, name = "30-day Moving Average", line = list(color = 'red')) %>% 
  layout(
      title = list(text="CTA: Daily Total Estimated Ridership"),
      xaxis = list(title="Date"),
        yaxis = list(title="Total Daily Public Transit Trips"))

fig

In Chicago we see an unmistakable steady decline in usage prior to the pandemic. The recovery appears to be somewhat slow, which is certainly something to analyze in future sections. It is possible that some of the struggles go beyond the impact of the pandemic.

Code
fig <- cta %>%
  plot_ly(
    x = ~Date,
    y = ~`total_rides`,
    group = ~type,
    color = ~type,
    type = 'scatter',
    mode = 'lines',
    showlegend = T
  )%>%
  layout(
      title = list(text="CTA: Daily Total Estimated Ridership (Split by Weekends and Weekdays)"),
      xaxis = list(title="Date"),
        yaxis = list(title="Total Daily Public Transit Trips"))
Warning in plot_ly(., x = ~Date, y = ~total_rides, group = ~type, color = ~type, : The group argument has been deprecated. Use `group_by()` or split instead.
See `help('plotly_data')` for examples
Code
fig
Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Finally, this tells the same story as the split graph in New York City. Weekdays consistently produce higher ridership, indicating that commutes to work and school represent a great share of total usage.

Monthly Ridership

To get a more specific view of how and why public transit ridership trends occur, we will now take a look at data that breaks it down by city, accounting for all metropolitan areas in the U.S. where applicable. The following plot simply shows total numbers by city for the most recent time step to show us which cities are most impacted by the industry. We will then use this data to decide where to focus our analysis.

Code
cities <- read_csv("../data/cities_ridership.csv")
cities <- cities[, c(7, 11:288)]
cities[is.na(cities)] <- 0
cities_agg <- cities %>%
  group_by(`UZA Name`) %>%
  summarise(across(everything(), sum, na.rm = TRUE))
head(cities_agg, 50)
Code
vrm <- read_csv("../data/cities_miles.csv")
vrm <- vrm[, c(7, 11:288)]
vrm[is.na(vrm)] <- 0
vrm_agg <- vrm %>%
  group_by(`UZA Name`) %>%
  summarise(across(everything(), sum, na.rm = TRUE))
head(vrm_agg)
Code
upt_subset <- cities_agg %>%
  dplyr::select(`UZA Name`, `Feb-25`) %>%
  dplyr::rename(upt = `Feb-25`)

vrm_subset <- vrm_agg %>%
  dplyr::select(`UZA Name`, `Feb-25`) %>%
  dplyr::rename(vrm = `Feb-25`)

# Join them by City
combined_df <- inner_join(upt_subset, vrm_subset, by = "UZA Name")
combined_df$upt <- log(combined_df$upt)
combined_df$vrm <- log(combined_df$vrm)

# Create scatter plot
plot_ly(combined_df, 
        x = ~upt, 
        y = ~vrm, 
        type = 'scatter', 
        mode = 'markers',
        text = ~`UZA Name`,
        marker = list(size = 10)) %>%
  layout(title = "City Comparison for February 2025",
         xaxis = list(title = "Log(Unlinked Passenger Trips)"),
         yaxis = list(title = "Log(Vehicle Ridership Miles)"))

This plot shows that vehicle miles and total passenger trips are heavily correlated, which is unsurprising. Due to the gap that occurs after the top nine cities in both categories, our city-by-city analyses will focus only on these nine cities:

  • New York, NY
  • Los Angeles, CA
  • Chicago, IL
  • Washington, DC
  • San Francisco, CA
  • Philadelphia, PA
  • Boston, MA
  • Seattle, WA
  • Miami, FL

Interestingly, this is not the list of the most populous cities in the U.S., indicating that there are great disparities in public transit usage beyond just volume or population density.

Code
upt_long <- cities_agg %>%
  pivot_longer(-`UZA Name`, names_to = "Month", values_to = "UPT")

vrm_long <- vrm_agg %>%
  pivot_longer(-`UZA Name`, names_to = "Month", values_to = "VRM")

combined_long <- full_join(upt_long, vrm_long, by = c("UZA Name", "Month"))

combined_long_clean <- combined_long %>%
  group_by(`UZA Name`, Month) %>%
  summarise(
    UPT = mean(UPT, na.rm = TRUE),
    VRM = mean(VRM, na.rm = TRUE),
    .groups = "drop"
  )

combined_wide <- combined_long_clean %>%
  pivot_wider(
    names_from = `UZA Name`,
    values_from = c(UPT, VRM),
    names_sep = " - "
  ) %>%
  dplyr::select(Month, everything())

head(combined_wide)
#write_csv(combined_wide, "../data/monthly_data.csv")
Code
combined_wide$Month <- as.Date(paste0(combined_wide$Month, "-01"), format = "%b-%y-%d")
write_csv(combined_wide, "../data/monthly_data.csv")
Code
plot <- plot_ly(combined_wide, x = ~Month) %>%
  add_lines(y = ~`UPT - New York--Jersey City--Newark, NY--NJ`, name = "New York, NY", line = list(color = 'darkgreen')) %>%
  add_lines(y = ~`UPT - Los Angeles--Long Beach--Anaheim, CA`, name = "Los Angeles, CA", line = list(color = 'orange')) %>%
  add_lines(y = ~`UPT - Chicago, IL--IN`, name = "Chicago, IL", line = list(color = '#cccc00')) %>%
  add_lines(y = ~`UPT - Washington--Arlington, DC--VA--MD`, name = "Washington, DC", line = list(color = 'green')) %>%
  add_lines(y = ~`UPT - San Francisco--Oakland, CA`, name = "San Francisco", line = list(color = 'blue')) %>%
  add_lines(y = ~`UPT - Philadelphia, PA--NJ--DE--MD`, name = "Philadelphia, PA", line = list(color = 'purple')) %>%
  add_lines(y = ~`UPT - Boston, MA--NH`, name = "Boston, MA", line = list(color = 'violet')) %>%
  add_lines(y = ~`UPT - Seattle--Tacoma, WA`, name = "Seattle, WA", line = list(color = 'brown')) %>%
  add_lines(y = ~`UPT - Miami--Fort Lauderdale, FL`, name = "Miami, FL", line = list(color = 'cyan')) %>%
  layout(
    title = "Public Transit Monthly Ridership by City",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Ridership (Total Unlinked Passenger Trips)"),
    legend = list(title = list(text = "Legend"))
  )

plot

The most obvious aspect of this plot is the great distance between ridership in New York City and all other cities. While New York is certainly the most populous and dense city in the country, this disparity far exceeds that, which tells us that there are characteristics there that are not emulated in other urban areas. The other important thing to note is that all of these cities were greatly impacted in 2020, telling us that the overall trend applies everywhere with no notable exception here. However, the differences in how ridership was trending before and after the pandemic will be important to analyze.

Rideshare Usage

Code
library(lubridate)
uber <- read_csv("../code/uber.csv")

uber <- uber %>%
  mutate(
    year = as.numeric(paste0("20", sub(".*'(\\d+)", "\\1", Date))),
    month = case_when(
      grepl("Q1", Date) ~ 1,
      grepl("Q2", Date) ~ 4,
      grepl("Q3", Date) ~ 7,
      grepl("Q4", Date) ~ 10
    ),
    date = make_date(year, month, 1)
  )
head(uber)
write_csv(uber, "../data/uber_riders.csv")
Code
fig <- uber %>% plot_ly(
  x = ~date,
  y = ~Rides,
  type = "bar"
)%>%
  layout(
      title = list(text="Quarterly Active Users of Uber - Worldwide"),
      xaxis = list(title="Date"),
        yaxis = list(title="Active Users"))

fig
Code
lyft <- read_csv("../code/lyft.csv")
lyft <- lyft %>%
  mutate(
    year = as.numeric(paste0("20", sub(".*'(\\d+)", "\\1", Date))),
    month = case_when(
      grepl("Q1", Date) ~ 1,
      grepl("Q2", Date) ~ 4,
      grepl("Q3", Date) ~ 7,
      grepl("Q4", Date) ~ 10
    ),
    date = make_date(year, month, 1)
  )
head(lyft)
write_csv(lyft, "../data/lyft_riders.csv")
Code
fig <- lyft %>% plot_ly(
  x = ~date,
  y = ~Rides,
  type = "bar"
)%>%
  layout(
      title = list(text="Quarterly Active Users of Lyft - North America"),
      xaxis = list(title="Date"),
        yaxis = list(title="Active Users"))

fig

These bar graphs, while they differ in scope due to Uber’s reporting of worldwide data and Lyft’s reporting of North American data, show a similar story to one another. Both services saw significant increases throughout the late 2010s. During the pandemic there was a drop in users just as there was for public transit. However, each service’s active user base has already returned to values close to or above their pre-pandemic values, which is a phenomenon not seen in any of our public transit data.

Vehicle Usage

Code
cars <- read_csv("../data/monthly_miles.csv")
cars$observation_date <- as.Date(cars$observation_date, format = "%m/%d/%y")
cars <- cars[cars$observation_date >= as.Date("1993-04-01"),]
cars <- cars[cars$observation_date <= as.Date("2024-01-01"),]
fig <- cars %>%
  plot_ly(
    x = ~observation_date,
    y = ~M12MTVUSM227NFWA,
    type = 'scatter',
    mode = 'lines',
    showlegend = F
  )%>% layout(
      title = list(text="Monthly Vehicle Miles Driven"),
      xaxis = list(title="Year"),
        yaxis = list(title="Vehicle Miles (Millions)"))

fig

This plot shows a relatively consistent increase in vehicle miles driven for the majority of years, but with dips in 2008 and 2020. From what we know about these time period, this indicates that this statistic is heavily impacted by economic factors, much like public transit ridership. This will be valuable to include in multivariate analysis.

Financial Data

Code
uber<-getSymbols("UBER",auto.assign = FALSE, from = "2020-01-01") 

uber=data.frame(uber)
uber <- data.frame(uber,rownames(uber))
colnames(uber)[7] = "date"

uber$date<-as.Date(uber$date,"%Y-%m-%d")

close=uber$UBER.Close

uber_ts=ts(close, start=decimal_date(as.Date("2020-01-01")), frequency = 252)
Code
lyft<-getSymbols("LYFT",auto.assign = FALSE, from = "2020-01-01") 

lyft=data.frame(lyft)
lyft <- data.frame(lyft,rownames(lyft))
colnames(lyft)[7] = "date"

lyft$date<-as.Date(lyft$date,"%Y-%m-%d")

close=lyft$LYFT.Close

lyft_ts=ts(close, start=decimal_date(as.Date("2020-01-01")), frequency = 252)
Code
rideshare_data <- data.frame(Date = uber$date, Uber = uber$UBER.Close, Lyft = lyft$LYFT.Close)
head(rideshare_data)
Code
plot <- plot_ly(rideshare_data, x = ~Date) %>%
  add_lines(y = ~Uber, name = "Uber Closing Price", line = list(color = 'black')) %>%
  add_lines(y = ~Lyft, name = "Lyft Closing Price", line = list(color = '#cc00bb')) %>%
  layout(
    title = "Rideshare Stock prices",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Price"),
    legend = list(title = list(text = "Legend"))
  )

# Display the plot
plot

This plot tells us that there’s a great disparity in the financial performance of two of the top rideshare service companies in the last three years. Both Uber and Lyft saw sharp dips in stock price during the height of the pandemic, followed by a rise in early 2021, presumably due to the return of commuters requiring transportation again. So far, this matches up well with the rideshare usage plot seen above. However since 2022, Lyft’s stock price has reached values lower than even in 2020 and remained there somewhat consistently, while Uber’s stock price continues an upward trend.

Code
zoom<-getSymbols("ZM",auto.assign = FALSE, from = "2020-01-01") 

zoom=data.frame(zoom)
zoom <- data.frame(zoom,rownames(zoom))
colnames(zoom)[7] = "date"

zoom$date<-as.Date(zoom$date,"%Y-%m-%d")

close=zoom$ZM.Close

zoom_ts=ts(close, start=decimal_date(as.Date("2020-01-01")), frequency = 252)
Code
microsoft<-getSymbols("MSFT",auto.assign = FALSE, from = "2020-01-01") 

microsoft=data.frame(microsoft)
microsoft <- data.frame(microsoft,rownames(microsoft))
colnames(microsoft)[7] = "date"

microsoft$date<-as.Date(microsoft$date,"%Y-%m-%d")

close=microsoft$MSFT.Close

microsoft_ts=ts(close, start=decimal_date(as.Date("2020-01-01")), frequency = 252)
Code
cisco<-getSymbols("CSCO",auto.assign = FALSE, from = "2020-01-01") 

cisco=data.frame(cisco)
cisco <- data.frame(cisco,rownames(cisco))
colnames(cisco)[7] = "date"

cisco$date<-as.Date(cisco$date,"%Y-%m-%d")

close=cisco$CSCO.Close

cisco_ts=ts(close, start=decimal_date(as.Date("2020-01-01")), frequency = 252)
Code
telecom_data <- data.frame(Date = cisco$date, Zoom = zoom$ZM.Close, Microsoft = microsoft$MSFT.Close, Cisco = cisco$CSCO.Close)
head(telecom_data)
Code
plot <- plot_ly(telecom_data, x = ~Date) %>%
  add_lines(y = ~Zoom, name = "Zoom Closing Price", line = list(color = 'blue')) %>%
  add_lines(y = ~Microsoft, name = "Microsoft Closing Price", line = list(color = '#008800')) %>%
  add_lines(y = ~Cisco, name = "Cisco Closing Price", line = list(color = '#bb0000')) %>%
  layout(
    title = "Telecommunications Stock prices",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Price"),
    legend = list(title = list(text = "Legend"))
  )

# Display the plot
plot

Somewhat similar to the rideshare plot, there is not much consistency between companies here. While Cisco’s stock price appears to consistently lag behind the others, and Microsoft has an upward trend with several interruptions, Zoom stock tells a very interesting story. Following the pandemic, this price skyrocketed to heights more than double its competitors, only to fall back to being consistenly under 100 dollars by 2022. It is important to note that these companies are not all equally associated with video conferencing in the way that Zoom is. Cisco and Microsoft’s stock are likely more related to their entire set of offerings, explaining why they are not as volatile.

Code
bp<-getSymbols("BP",auto.assign = FALSE, from = "2020-01-01") 

bp=data.frame(bp)
bp <- data.frame(bp,rownames(bp))
colnames(bp)[7] = "date"

bp$date<-as.Date(bp$date,"%Y-%m-%d")
Code
shell<-getSymbols("SHEL",auto.assign = FALSE, from = "2020-01-01") 

shell=data.frame(shell)
shell <- data.frame(shell,rownames(shell))
colnames(shell)[7] = "date"

shell$date<-as.Date(shell$date,"%Y-%m-%d")
Code
exxon<-getSymbols("XOM",auto.assign = FALSE, from = "2020-01-01") 

exxon=data.frame(exxon)
exxon <- data.frame(exxon,rownames(exxon))
colnames(exxon)[7] = "date"

exxon$date<-as.Date(exxon$date,"%Y-%m-%d")
Code
chevron<-getSymbols("CVX",auto.assign = FALSE, from = "2020-01-01") 

chevron=data.frame(chevron)
chevron <- data.frame(chevron,rownames(chevron))
colnames(chevron)[7] = "date"

chevron$date<-as.Date(chevron$date,"%Y-%m-%d")
Code
oil_data <- data.frame(Date = chevron$date, BP = bp$BP.Close, Shell = shell$SHEL.Close, Exxon = exxon$XOM.Close, Chevron = chevron$CVX.Close)
head(oil_data)
Code
plot <- plot_ly(oil_data, x = ~Date) %>%
  add_lines(y = ~BP, name = "BP Closing Price", line = list(color = '#66cc00')) %>%
  add_lines(y = ~Shell, name = "Shell Closing Price", line = list(color = '#cccc00')) %>%
  add_lines(y = ~Exxon, name = "Exxon Closing Price", line = list(color = '#bb0000')) %>%
  add_lines(y = ~Chevron, name = "Chevron Closing Price", line = list(color = '#0000bb')) %>%
  layout(
    title = "Oil Stock prices",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Price"),
    legend = list(title = list(text = "Legend"))
  )

# Display the plot
plot

Unlike the other industries, all four of these stock prices seem to rise and fall with one another. This says a great deal about the homogeneity of the oil industry. In this case, all four prices took a predictable hit in 2020, followed by a generally gradual climb thereafter. There is certainly less volatility here than in other industries.

Gas Prices

Code
gas <- read_csv("../data/gas_prices.csv")
gas$Month <- as.Date(paste0(gas$Month, "-01"), format = "%b-%y-%d")
gas <- gas[gas$Month >= as.Date("1993-04-01"),]
gas <- gas[gas$Month <= as.Date("2024-01-01"),]
gas <- gas[order(gas$Month), ]
Code
fig <- gas %>%
  plot_ly(
    x = ~Month,
    y = ~`Price per Gallon`,
    type = 'scatter',
    mode = 'lines',
    showlegend = F
  )%>% layout(
      title = list(text="Monthly Average Gas Prices"),
      xaxis = list(title="Year"),
        yaxis = list(title="Average Price (Dollars)"))

fig

This plot of average monthly gas prices will serve as an indicator for the viablity of travel via automobile as an alternative to public transit methods. What we see here is that gas prices often correlate to larger macroeconomic trends, as prices dip during recessions and skyrocket during times of inflation and supply chain mishaps.

Unemployment Rates

Code
unemp <- read_csv("../data/unemployment.csv")
unemp <- unemp[unemp$observation_date >= as.Date("1993-04-01"),]
unemp <- unemp[unemp$observation_date <= as.Date("2024-01-01"),]
Code
fig <- unemp %>%
  plot_ly(
    x = ~observation_date,
    y = ~UNRATE,
    type = 'scatter',
    mode = 'lines',
    showlegend = F
  )%>% layout(
      title = list(text="Monthly Unemployment Rate"),
      xaxis = list(title="Year"),
        yaxis = list(title="Unemployment Rate"))

fig

Finally, the unemployment rate is clearly heavily tied to larger macroeconomic trends, as well. We see large spikes in 2008 and 2020, times of great recession. Meanwhile, most of the time, unemployment rates gradually decrease in the absence of major economic events. This will provide information on the necessity to use public transit as a means of commuting to work.

Code
write_csv(uber, "../data/uber.csv")
write_csv(lyft, "../data/lyft.csv")
write_csv(zoom, "../data/zoom.csv")
write_csv(microsoft, "../data/microsoft.csv")
write_csv(cisco, "../data/cisco.csv")
write_csv(bp, "../data/bp.csv")
write_csv(shell, "../data/shell.csv")
write_csv(exxon, "../data/exxon.csv")
write_csv(chevron, "../data/chevron.csv")
write_csv(ridership, "../data/ridership.csv")
write_csv(mta, "../data/mta.csv")
write_csv(cta, "../data/cta.csv")
write_csv(rideshare_data, "../data/rideshare.csv")
write_csv(telecom_data, "../data/telecom.csv")
write_csv(oil_data, "../data/oil.csv")